Pitch synchronous autocorrelation vocoder



Oct. 29, 1963 E. E. DAVID, JR., ET AL 3,109,070

PITCH SYNCHRONOUS AUTOCORRELATION VOCODER Filed Aug. 9, 1960 11 Sheets-Sheet 1

INVENTORS: E. E. DAVID, JR., and J. R. PIERCE

United States Patent

Filed Aug. 9, 1960, Ser. No. 48,423

12 Claims. (Cl. 179-4555)

This invention relates to speech transmission systems, and particularly to the transmission of speech over narrow-frequency band channels in terms of autocorrelation functions.

Among speech coding systems for the conservation of transmission channel bandwidth, one of the best known is the so-called channel vocoder described in H. W. Dudley Patent 2,151,091, issued March 21, 1939. At the transmitter terminal of the channel vocoder, the amplitude spectrum of an incoming speech wave is divided into frequency bands by a group of band-pass filters, and the energy contained within each band is represented by the magnitude of a low-frequency control signal. After transmission over a reduced bandwidth channel to a receiver terminal, the control signals adjust the energies of frequency bands of an excitation spectrum generated at the synthesizer. The energy-adjusted frequency bands are then combined to produce synthetic speech.

Synthetic speech produced by the Dudley vocoder is distorted by the inherent limitations of the band-pass filters that it employs. For ideal reproduction of speech, the speech amplitude spectrum should be represented by a group of points; as practiced by the Dudley vocoder, however, the finite widths of the band-pass filters produce points that are in fact averages of many points within the spectral bands passed by the filters.

It is a specific object of this invention to eliminate the need for band-pass filters and to reduce distortion in synthesized speech by representing speech in terms of nearly ideal points on its autocorrelation functions.

In this invention, an incoming speech wave is first band-limited and then correlated with itself at the analyzer to obtain a number of samples of each period of the speech autocorrelation function. These samples of the autocorrelation function constitute a group of control signals. Each control signal occupies a relatively narrow-frequency band because the autocorrelation function changes very little from period to period; further, the entire group of control signals occupies a much smaller band of frequencies than does the original speech wave, thereby yielding a substantial saving in transmission channel bandwidth. At the synthesizer, the control signals adjust the amplitude of an excitation signal generated at the synthesizer, and a synthetic speech wave is reconstructed from the amplitude-adjusted excitation signal.

Another autocorrelation function representation of speech is obtained in the present invention by correlating the incoming speech wave with both itself and its Hilbert transform to obtain a number of autocorrelation samples composed in equal parts of samples of periods of the autocorrelation function and samples of periods of the ninety-degree phase-shifted autocorrelation function. These samples of the two autocorrelation functions also constitute a group of narrow-band control signals.

In order to reconstruct an accurate replica of the speech autocorrelation function at the synthesizer, the sampling theorem requires that the samples of each period of the autocorrelation function be spaced at the Nyquist interval. Since the number of samples per period depends upon the length of the fundamental period, which is the reciprocal of the variable fundamental pitch frequency of the original speech wave, variations in the fundamental pitch frequency must be accompanied by changes in the number of autocorrelation samples if distortion-free speech is to be reproduced. Accordingly, it is a specific object of this invention to reproduce synthetic speech with accuracy by varying the number of autocorrelation samples in synchrony with the instantaneous fundamental pitch frequency.

The number of autocorrelation samples is varied in synchrony with the instantaneous fundamental pitch frequency by first obtaining a fixed number of samples corresponding to the longest pitch period to be accurately reproduced. The fixed number of samples is then passed through a bank of gates that are set to close at various predetermined thresholds in response to a variable magnitude pitch control signal. As the talker's pitch changes, the pitch control signal changes the number of gates in the open condition, thereby varying the number of autocorrelation samples passed by the gates. The samples passed by the gates constitute narrow-band control signals from which distortion-free synthetic speech may be reproduced.

Transmission channel bandwidth is further conserved in this invention by taking advantage of the symmetry of the autocorrelation function. This is achieved by obtaining samples of half periods of the symmetrical autocorrelation function at the analyzer, and by deriving from each control signal transmitted to the synthesizer two equal amplitude, symmetrically located points on each period of the reconstructed speech wave. Similarly, half periods of the antisymmetrical ninety-degree phase-shifted autocorrelation function are sampled at the analyzer, and at the synthesizer, two equal amplitude, antisymmetrically located points on each period of the reconstructed wave are derived from each half-period sample. By thus reducing the number of control signals by a factor of two, the bandwidth necessary to transmit the control signals is also reduced by a factor of two.

An inherent source of distortion in the representation of most speech sounds by autocorrelation functions lies in the quadratic character of the amplitude spectra of autocorrelation functions generally, as compared with the amplitude spectra of the waves from which they are derived. It is a specific object of this invention to eliminate the quadratic distortion inherent in the autocorrelation function representation of speech by performing a square-root-taking operation upon the amplitude spectrum of the speech autocorrelation function.

In this invention, the autocorrelation control signals are converted into signals representative of discrete values of the autocorrelation amplitude spectrum by a network composed of an array of multiplier elements supplied with weighting signals. The control signals and the weighting signals are combined by the array of multipliers in accordance with a Fourier transformation to produce autocorrelation amplitude spectrum signals. The spectrum signals are then applied to a group of square-root-taking circuits which produce rooted spectrum signals representing an amplitude spectrum of the same shape as the amplitude spectrum of the original speech wave. Artificial speech free of spectrum squaring distortion is reproduced from the autocorrelation function counterparts of the rooted spectrum signals, the counterparts being generated by a second array of multipliers supplied with weighting signals in accordance with an inverse Fourier transformation.
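The transform, root, and inverse-transform chain described above can be sketched numerically. The following is a minimal illustration in Python, not the multiplier-array circuit of the invention. It assumes that the half-period samples (including the zero-delay sample) lie on a uniform grid and are expanded in a cosine series; the function names `cosine_coeffs`, `cosine_series`, and `unsquare` are hypothetical, and scale factors between these one-sided coefficients and the spectrum values of the text are ignored.

```python
import math

def cosine_coeffs(samples):
    """Return a[n] such that samples[m] == sum_n a[n] * cos(pi*n*m/N),
    where N = len(samples) - 1 (assumed >= 1)."""
    N = len(samples) - 1
    coeffs = []
    for n in range(N + 1):
        # The two end samples carry half weight (they are shared with the mirror half).
        total = 0.5 * samples[0] + 0.5 * (-1) ** n * samples[N]
        for m in range(1, N):
            total += samples[m] * math.cos(math.pi * n * m / N)
        scale = 1.0 if n in (0, N) else 2.0
        coeffs.append(scale * total / N)
    return coeffs

def cosine_series(coeffs):
    """Inverse of cosine_coeffs: evaluate the cosine series at m = 0..N."""
    N = len(coeffs) - 1
    return [sum(a * math.cos(math.pi * n * m / N) for n, a in enumerate(coeffs))
            for m in range(N + 1)]

def unsquare(samples):
    """Transform, take square roots of the spectrum values, transform back."""
    squared = cosine_coeffs(samples)
    # Assumption: tiny negative values are numerical noise and are clipped to zero.
    rooted = [math.sqrt(max(a, 0.0)) for a in squared]
    return cosine_series(rooted)
```

Under these assumptions, samples generated from the squared coefficients 4, 9, 1, 0.25 come back as the samples of a series with coefficients 2, 3, 1, 0.5.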

An important feature of this invention is the variation of the magnitudes of the output signals of the multiplier arrays in synchrony with changes in the fundamental pitch frequency of the original speech wave. This synchronism is required for accurate reproduction of speech, since both the autocorrelation function and its amplitude spectrum are functions of the variable fundamental pitch frequency. Synchronization is achieved by varying the magnitudes of the weighting signals in synchrony with the instantaneous fundamental pitch frequency.

The invention will be fully understood from the following detailed description of illustrative embodiments thereof taken in connection with the appended drawings, in which:

FIGS. 1A, 1B, 1C, 1D, 1E, and 1F are waveform diagrams of assistance in explaining the operation of the apparatus of this invention;

FIG. 1G is a schematic block diagram showing a complete speech transmission system based upon the principles of this invention;

FIG. 2A is a schematic block diagram showing apparatus for representing speech in terms of autocorrelation control signals;

FIG. 2B is a graph showing the variation of the threshold voltages of the gates of FIG. 2A with increasing delay times of the input signals to the gates;

FIG. 2C is a graph showing the variation of pitch control signal voltage from circuit PC2 of FIG. 2A as a function of the fundamental pitch frequency of the incoming speech wave;

FIG. 3 is a schematic block diagram showing apparatus for reconstructing artificial speech from autocorrelation control signals;

FIG. 4 is a schematic block diagram showing apparatus alternative to that of FIG. 2A;

FIG. 5 is a schematic block diagram showing apparatus for reconstructing artificial speech from autocorrelation control signals produced by the apparatus of FIG. 4;

FIG. 6A is a schematic block diagram showing apparatus for unsquaring the amplitude spectrum of the control signals produced by the apparatus of FIG. 2A;

FIG. 6B is a schematic block diagram showing apparatus for generating autocorrelation control signals from the rooted spectrum signals produced by the apparatus of FIG. 6A;

FIG. 7A is a schematic block diagram showing apparatus for unsquaring the amplitude spectrum of the control signals produced by the apparatus of FIG. 4; and

FIGS. 7B and 7C are schematic block diagrams showing apparatus for generating autocorrelation control signals from the rooted spectrum signals produced by the apparatus of FIG. 7A.

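Mathematical Foundations

A periodic speech wave $g(t)$ with period $T$, fundamental frequency $f_0 = 1/T$, and bandwidth $W$ may be expanded in a Fourier series

$$g(t) = \sum_{n=-TW}^{TW} G(nf_0)\cos\left(2\pi n f_0 t + \theta_n\right) \qquad (1)$$

where the coefficients $G(nf_0)$, with $G(-nf_0) = G(nf_0)$, constitute the amplitude spectrum of $g(t)$, and $\theta_n$, with $\theta_{-n} = -\theta_n$, are the phase angles.

The autocorrelation function of $g(t)$ is defined as

$$\varphi(\tau) = \frac{1}{T}\int_{-T/2}^{T/2} g(t)\,g(t-\tau)\,dt \qquad (2)$$

where $\varphi(\tau)$ has the same period as $g(t)$, and the variable $\tau$ represents the amount of time by which the speech wave is delayed. Substituting (1) in (2) and performing the indicated integration, Equation 2 becomes

$$\varphi(\tau) = \sum_{n=-TW}^{TW} \Phi(nf_0)\cos 2\pi n f_0 \tau \qquad (3)$$

where

$$\Phi(nf_0) = \Phi(-nf_0) \qquad (3a)$$

and

$$\Phi(nf_0) = G^2(nf_0) \qquad (3b)$$

that is, $\Phi(nf_0)$ is an even function, the amplitude spectrum of $\varphi(\tau)$ is the square of the amplitude spectrum of $g(t)$, and all of the phase angles of $\varphi(\tau)$ are zero.

The Hilbert transform of $g(t)$ is defined as

$$\hat{g}(t) = \sum_{n=-TW}^{TW} \operatorname{sgn}(n)\,G(nf_0)\sin\left(2\pi n f_0 t + \theta_n\right) \qquad (4)$$

and by correlating $\hat{g}(t)$ and $g(t)$ in accordance with Equation 2, there is produced the ninety-degree phase-shifted speech autocorrelation function

$$\hat{\varphi}(\tau) = \frac{1}{T}\int_{-T/2}^{T/2} \hat{g}(t)\,g(t-\tau)\,dt \qquad (5)$$

where $\hat{\varphi}(\tau)$ has the same period as $g(t)$. Substituting (1) and (4) in (5) and performing the indicated integration, Equation 5 becomes

$$\hat{\varphi}(\tau) = \sum_{n=-TW}^{TW} \hat{\Phi}(nf_0)\sin 2\pi n f_0 \tau \qquad (6)$$

where

$$\hat{\Phi}(nf_0) = \Phi(nf_0), \quad \text{for } nf_0 > 0 \qquad (7)$$

and

$$\hat{\Phi}(nf_0) = -\Phi(nf_0), \quad \text{for } nf_0 < 0 \qquad (7a)$$

that is, $\hat{\Phi}(nf_0)$ is an odd function with the same absolute amplitude as $\Phi(nf_0)$.

FIG. 1A shows several periods of a typical speech wave, and FIGS. 1B and 1C show several periods of its autocorrelation functions $\varphi(\tau)$ and $\hat{\varphi}(\tau)$, respectively.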
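The two spectral relations stated above (a squared, zero-phase cosine series for the autocorrelation function, and a sine series with the same amplitudes for the ninety-degree phase-shifted function) can be checked numerically. The sketch below is an illustration only; it assumes a discrete, exactly periodic wave built from a few harmonics, and the helper names `harmonic_wave` and `circular_correlation` are hypothetical.

```python
import math

def harmonic_wave(amps, phases, M, quadrature=False):
    """One period (M samples) of sum_n 2*G_n*cos(2*pi*n*k/M + theta_n), n = 1..len(amps).
    With quadrature=True every cosine is replaced by a sine (a ninety-degree shift)."""
    fn = math.sin if quadrature else math.cos
    return [sum(2.0 * G * fn(2 * math.pi * n * k / M + th)
                for n, (G, th) in enumerate(zip(amps, phases), start=1))
            for k in range(M)]

def circular_correlation(x, y, lag):
    """(1/M) * sum_k x[k] * y[k - lag], with indices taken modulo the period M."""
    M = len(x)
    return sum(x[k] * y[(k - lag) % M] for k in range(M)) / M

amps, phases, M = [1.0, 0.5, 0.25], [0.3, -1.1, 2.0], 64
g = harmonic_wave(amps, phases, M)
g_hat = harmonic_wave(amps, phases, M, quadrature=True)

lag = 5
phi = circular_correlation(g, g, lag)          # depends only on the squared amplitudes
phi_hat = circular_correlation(g_hat, g, lag)  # same amplitudes, sine terms
```

In this sketch `phi` equals the sum of `2*G*G*cos(2*pi*n*lag/M)` over the harmonics regardless of the phases chosen, and `phi_hat` equals the corresponding sum of sine terms, so it changes sign when the lag is negated.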

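FIG. 1D shows the amplitude spectrum of a typical voiced sound, and FIG. 1E shows the amplitude spectrum of the corresponding speech autocorrelation function. A comparison of FIG. 1D with FIG. 1E reveals the effect of squaring the amplitude spectrum as given in Equation 3b.

[Equation 8, which expresses $\Phi(nf_0)$ in terms of samples of $\varphi(\tau)$, is not recoverable from the source text.]

where $T$ is the period and $W$ is the bandwidth of both $g(t)$ and $\varphi(\tau)$. Since the half-period samples of both $\varphi(\tau)$ and the cosine function from $-\frac{T}{2}$ to $0$ are mirror images of the half-period samples from $0$ to $\frac{T}{2}$, $\Phi(nf_0)$ is uniquely determined by either set of half-period samples; hence Equation 8 may be rewritten

[Equation 9 is not recoverable from the source text.]

The inverse of Equation 9, corresponding to Equation 3, is

$$\varphi\left(\frac{m}{2W}\right) = \sum_{n=-TW}^{TW} \Phi(nf_0)\cos\frac{\pi n m}{TW}, \quad m = 0, 1, \ldots, TW \qquad (10)$$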

The symmetry of the values of $\Phi(nf_0)$ itself over the interval $[-W, W]$ implies that $\Phi(nf_0)$ is uniquely determined by its values over either of the half intervals, $[-W, 0]$ or $[0, W]$. Considering the nonsymmetrical, one-sided spectrum

$$\Phi_1(nf_0) = \Phi(nf_0), \quad 0 \le nf_0 \le W, \qquad (11)$$

over the half interval $[0, W]$, it may be decomposed into an even function $\Phi_e(nf_0)$ and an odd function $\Phi_o(nf_0)$,

$$\Phi_1(nf_0) = \Phi_e(nf_0) + \Phi_o(nf_0) \qquad (12)$$

From Equations 7 and 7a, Equation 12 may be written

[Equation 13 is not recoverable from the source text.]

Expanding $\Phi_1(nf_0)$ in a Fourier series, the even part $\Phi_e(nf_0)$ gives rise to cosine terms whose coefficients are

[Equation 14 is not recoverable from the source text.]

and the odd part $\Phi_o(nf_0)$ gives rise to sine terms whose coefficients are

[Equation 15 is not recoverable from the source text.]

or, from symmetry considerations,

[Equations 16 and 17 are not recoverable from the source text.]

The speech correlation function corresponding to $\Phi_1(nf_0)$ may also be expanded in a Fourier series

[Equation 18 is not recoverable from the source text.]

By definition, the coefficients of the sine and cosine terms in Equation 18 may be written in terms of the one-sided spectrum,

[Equations 19 and 19a are not recoverable from the source text.]

Substituting (19) and (19a) into (18), the correlation function may be expressed in terms of the one-sided spectrum,

[Equation 20 is not recoverable from the source text.]

Complete Speech Transmission System

Referring first to FIG. 1G, an incoming speech wave from source 80 is applied in parallel to pitch circuit 81 and to speech autocorrelation function analyzer 82. Analyzer 82 derives samples of the speech autocorrelation function from the incoming speech wave, and, under the control of a pitch signal from circuit 81, varies the number of samples in synchrony with the instantaneous fundamental pitch frequency of the speech wave. The details of source 80, circuit 81, and analyzer 82 are described below, as are the details of the other elements of FIG. 1G.

The amplitude spectrum of the autocorrelation samples produced by analyzer 82 is unsquared by apparatus composed of weighting signal sources 83, Fourier transformation networks 84 and 86, and square-root-taking circuits 85. A speech wave is reconstructed from the unsquared autocorrelation samples by speech synthesizer 88, utilizing an excitation signal from generator 87, and artificial speech is reproduced from the reconstructed speech wave by reproducer 89.

Pitch circuit 81 and analyzer 82 are located at a transmitter terminal, while generator 87 and synthesizer 88 are located at a receiver terminal. Elements 83, 84, 85, and 86, however, may be located at either the transmitter terminal or the receiver terminal.

Analyzer

Referring now to FIG. 2A, an incoming speech wave g(t) from source 20, for example, a microphone, is passed through low-pass filter 210, which may be of any well-known variety. Filter 210 serves to band-limit the speech wave by removing all frequency components of g(t) greater than W cycles per second, where, for example, the passband of filter 210 may be chosen so as to remove all frequency components greater than W=3,000 cycles per second.

The band-limited speech wave output of filter 210 is applied simultaneously to a tapped delay line, for example, a tapped acoustic delay line 231, and to one of the input terminals of a bank of multipliers, for example, a bank of well-known modulators M21, M22, ..., M2N, each provided with two input terminals and one output terminal.

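Delay line 231, which is terminated in a matched impedance 211 to prevent reflection, is provided with N taps P21, P22, ..., P2N, at which appear signals proportional to samples of the variously delayed speech wave, $g(t-\tau_1)$, $g(t-\tau_2)$, ..., $g(t-\tau_N)$, where $\tau_1 = \frac{1}{2W}$ seconds, $\tau_2 = \frac{2}{2W}$ seconds, ..., $\tau_N = \frac{N}{2W}$ seconds are the delay times at the respective taps.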

The modulators develop at their output terminals signals proportional to the products $g(t) \cdot g(t-\tau_1)$, $g(t) \cdot g(t-\tau_2)$, ..., $g(t) \cdot g(t-\tau_N)$, which are then passed to a bank of averaging devices, for example, conventional low-pass filters F21, F22, ..., F2N, each having a cutoff of 25 cycles per second. The magnitudes of the output signals of filters F21, F22, ..., F2N are proportional to time averages of the product signals received from the bank of modulators; hence from Equation 2, it is seen that the magnitudes of these signals are proportional to samples of the autocorrelation function at various delay times, $\varphi(\tau_1)$, $\varphi(\tau_2)$, ..., $\varphi(\tau_N)$.

In addition, the incoming speech wave is correlated with itself to form a signal proportional to $\varphi(0)$. This is achieved by applying the undelayed speech wave to both input terminals of modulator M20, and by passing the product output signal of M20 through low-pass filter F20.
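The delay, multiply, and average chain of FIG. 2A has a simple discrete-time counterpart, sketched below as an illustration only. It assumes the speech wave has already been sampled every 1/(2W) seconds, so that a lag of j samples stands for the tap delay j/(2W); a plain time average stands in for the 25-cycle low-pass filters, and the function name `autocorrelation_samples` is hypothetical.

```python
import math

def autocorrelation_samples(signal, num_taps):
    """Time-averaged products signal[k] * signal[k - j] for j = 0..num_taps.

    Assumes `signal` is sampled every 1/(2W) s, so lag j is a delay of j/(2W) s,
    and that num_taps is smaller than len(signal)."""
    n = len(signal)
    averages = []
    for lag in range(num_taps + 1):
        # Modulator: product of the undelayed and the delayed wave.
        products = [signal[k] * signal[k - lag] for k in range(lag, n)]
        # Averaging device: plain mean in place of a low-pass filter.
        averages.append(sum(products) / len(products))
    return averages

# A cosine with a period of 12 samples: the result is close to 0.5*cos(2*pi*j/12).
wave = [math.cos(2 * math.pi * k / 12) for k in range(1200)]
phi = autocorrelation_samples(wave, 6)
```

Element 0 of the result plays the part of the zero-delay output of filter F20, and elements 1 to `num_taps` play the part of the outputs of filters F21 to F2N.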

As pointed out in FIG. 1B, the samples of $\varphi(\tau)$ over each half period from $-\frac{1}{2W}$ to $-\frac{T}{2}$ seconds delay are mirror images of the samples of $\varphi(\tau)$ over the half period from $\frac{1}{2W}$ to $\frac{T}{2}$ seconds delay; hence $\varphi(\tau)$ is uniquely represented by TW samples, in addition to $\varphi(0)$, starting at $\frac{1}{2W}$ seconds delay and ending at $\frac{T}{2}$ seconds delay, with a delay interval of $\frac{1}{2W}$ seconds between successive samples. By reducing the number of control signals by a factor of two, the bandwidth necessary to transmit the control signals is also reduced by a factor of two.

For example, if $\varphi(\tau)$ has a period of T=10 milliseconds, then for a bandlimit of W=3,000 cycles per second, $\varphi(\tau)$ is uniquely determined by providing delay line 231 with TW=30 taps, where the delay time at tap P21 is $\frac{1}{2W} = \frac{1}{6}$ millisecond, the delay time at tap P22 is $\frac{2}{2W} = \frac{1}{3}$ millisecond, the delay time at tap P2N is $\frac{T}{2} = 5$ milliseconds, and the delay interval between adjacent taps is $\frac{1}{6}$ millisecond.
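The tap arithmetic of this example is easy to reproduce. The short sketch below uses exact fractions so that the 1/6 millisecond spacing is not rounded; the function name `tap_delays_ms` is hypothetical and the code assumes T times W is a whole number.

```python
from fractions import Fraction

def tap_delays_ms(period_ms, bandwidth_hz):
    """Delays, in milliseconds, of the T*W taps spaced 1/(2W) apart and ending at T/2."""
    spacing_ms = Fraction(1000, 2 * bandwidth_hz)               # 1/(2W), in ms
    num_taps = int(Fraction(period_ms) * bandwidth_hz / 1000)   # T*W
    return [k * spacing_ms for k in range(1, num_taps + 1)]

delays = tap_delays_ms(10, 3000)
# 30 taps: the first at 1/6 ms, the second at 1/3 ms, the last at 5 ms.
```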

Obtaining the correct number of samples of the speech autocorrelation function is complicated by variations in the period T of the speech autocorrelation function in synchrony with variations in the fundamental pitch frequency, $f_0 = \frac{1}{T}$, of the original speech wave. In order to obtain the proper number of samples of each half period of the speech autocorrelation function despite these variations, delay line 231 of FIG. 2A is provided with that number of taps needed to obtain at the output terminals of filters F20, F21, F22, ..., F2N a sufficient number of half-period autocorrelation samples of the longest period that it is desired to represent fully. Filters F20, F21, F22, ..., F2N are then connected to conventional gates G20, G21, G22, ..., G2N, respectively, and when shorter periods occur, excess samples occurring at points past the $\frac{T}{2}$ seconds or half-period sample point are blocked by closing the appropriate gates.

Gates G20, G21, G22, ..., G2N are set to close at various predetermined thresholds in response to a pitch control signal from circuit PC2 whose magnitude varies in synchrony with the fundamental pitch frequency, as illustrated in FIG. 2C. Correspondingly, as shown in FIG. 2B, the thresholds of the gates are set at progressively lower voltages as the delay times of the autocorrelation samples applied to them increase. The solid line in FIG. 2C indicates that the voltage of the pitch control signal applied to the gates increases as the pitch periods become shorter than the maximum period, and the pitch control signal closes those gates that are supplied with autocorrelation samples occurring at delay times past the half-period sample point.

For example, if the longest pitch period that it is desired to represent fully is 10 milliseconds (corresponding to a deep bass voice of fundamental pitch frequency 100 cycles per second), and if frequency components greater than W=3,000 cycles per second are removed by filter 210 of FIG. 2A, then in addition to $\varphi(0)$, 30 half-period autocorrelation samples are required to represent the period fully. By providing delay line 231 with 30 taps, and by providing 30 modulators and 30 filters (in addition to modulator M20 and filter F20 for $\varphi(0)$), the longest autocorrelation function periods are adequately sampled. To the output terminal of each filter there is connected a gate, and when shorter pitch periods occur, for example, a pitch period of 5 milliseconds (corresponding to a fundamental pitch frequency of 200 cycles per second), then only 15 half-period samples are required, besides $\varphi(0)$, and the remaining 15 samples, which occur at delay times past the half-period sample point, 2.5 milliseconds, are eliminated by closing the 15 gates receiving the excess samples from their respective filters.
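The gating rule can be modeled in a few lines. The sketch below is one possible reading, under stated assumptions: the control voltage is taken as directly proportional to pitch frequency, and the threshold of gate k is placed at the voltage for which the half period equals that gate's tap delay k/(2W). The names `gate_thresholds`, `control_voltage`, and `open_gates`, and the `volts_per_hz` scale, are hypothetical; the patent gives no numerical threshold values.

```python
def gate_thresholds(bandwidth_hz, num_taps, volts_per_hz=1.0):
    """Threshold of gate k, falling as the tap delay k/(2W) grows.

    Gate k is needed while k/(2W) <= 1/(2*f0), that is, while f0 <= W/k."""
    return [volts_per_hz * bandwidth_hz / k for k in range(1, num_taps + 1)]

def control_voltage(pitch_hz, volts_per_hz=1.0):
    """Pitch control signal, assumed directly proportional to pitch frequency."""
    return volts_per_hz * pitch_hz

def open_gates(control_volts, thresholds):
    """1-based indices of the gates that stay open for this control voltage."""
    # A small relative tolerance guards the boundary case against rounding.
    return [k for k, level in enumerate(thresholds, start=1)
            if control_volts <= level * (1.0 + 1e-9)]

thresholds = gate_thresholds(3000, 30)
bass = open_gates(control_voltage(100), thresholds)    # all 30 gates open
higher = open_gates(control_voltage(200), thresholds)  # gates 1 to 15 open
```

With these assumptions the model reproduces the counts of the example: 30 samples pass at 100 cycles per second and 15 pass at 200 cycles per second.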

In order to derive the pitch control signal for gates G20, G21, G22, ..., G2N, the band-limited speech wave output of filter 210 is applied simultaneously to pitch detector 212 and voiced-unvoiced detector 213 of circuit PC2. Pitch detector 212, which may be of any well-known construction, develops at its output terminal a signal whose magnitude is directly proportional to the instantaneous fundamental pitch frequency, $f_0$, of voiced sounds, as illustrated by the solid line in FIG. 2C. The output terminal of pitch detector 212 is connected to the control terminals of gates G20, G21, G22, ..., G2N by relay 215 when voiced-unvoiced detector 213, of well-known design, detects the presence of voiced sounds. When unvoiced sounds are present, however, voiced-unvoiced detector 213 causes relay 215 to connect the output terminal of energy source 216 to the control terminals of gates G20, G21, G22, ..., G2N. Energy source 216 produces a signal of constant magnitude E, as shown by the dashed line in FIG. 2C, which permits sufficient autocorrelation function samples to be passed by gates G20, G21, G22, ..., G2N to specify unvoiced sounds.

The autocorrelation samples passed by the gates constitute a set of control signals that represent half periods of the speech autocorrelation function. It has been observed that the speech autocorrelation function changes very little from period to period, hence the variation of the control signals is very small. As a result, each control signal occupies a relatively narrow-frequency band, on the order of 25 cycles per second, and the entire group of control signals may be transmitted, if desired, over a much narrower frequency band than is required for transmission of the original speech wave. It is to be understood, however, that the control signals may be transmitted by means of any well-known transmission method, for example, pulse-code modulation, to meet the requirements of a particular application of this invention.

The autocorrelation control signals transmitted to the synthesizer terminal must be supplemented with a signal that indicates whether the instantaneous speech sound is voiced or unvoiced, and if voiced, its fundamental pitch frequency. At the synthesizer, there is derived from this supplementary signal an excitation signal which is closely correlated with the vocal characteristics of the original speech wave, and the artificial speech reconstructed from the excitation signal thus preserves the vocal characteristics of the original speech. As shown in circuit PC2 of FIG. 2A, the output terminal of voiced-unvoiced detector 213 is connected to relay 214 so as to operate the relay when sounds detected by microphone 20 are voiced, and to disconnect the relay when the sounds are unvoiced.

When the contacts of relay 214 are closed in response to the presence of voiced sounds, the output signal from pitch detector 212 serves as a supplementary pitch control signal to be transmitted together with the autocorrelation control signals.

Synthesizer

At the synthesizer, shown in FIG. 3, the incoming autocorrelation control signals are applied to the control terminals of a bank of modulators L30, L31, ..., L3N, respectively. The output terminal of modulator L30 is connected to the center tap R30 of delay line 341, which is terminated in a matched impedance 321 to prevent reflection. The output terminal of each of the other modulators L31, ..., L3N is connected to delay line 341 via two taps which are disposed at equal intervals about center tap R30. Thus, for example, the output terminal of modulator L31 is connected to taps R31, r31 spaced at equal $\frac{1}{2W}$ second intervals about tap R30, respectively, and the output terminal of modulator L3N is connected to taps R3N, r3N spaced at equal $\frac{T}{2}$ second intervals about tap R30, respectively.

The incoming control signals applied to the control terminals of the modulators adjust the amplitude of a vocal excitation signal supplied to the modulators from vocal excitation signal generator 391. The excitation signal is derived by generator 391 from the incoming pitch control signal, which is first passed through an equalizing delay 31 to synchronize the pitch control signal with the control signals. Excitation generator 391, which may comprise conventional buzz and hiss sources, operates to provide an excitation signal to the modulators that is closely correlated with the voiced-unvoiced and fundamental pitch frequency characteristics of the original speech wave. Artificial speech reconstructed under the influence of the control signals from the excitation signal supplied by generator 391 thus preserves the naturalness as well as the intelligibility of the original speech wave.

The amplitude-adjusted excitation signals formed by the modulators at taps R31, ..., R3N are symmetrically located with respect to the signals formed at taps r31, ..., r3N, thereby duplicating the symmetry of $\varphi(\tau)$ about the center of each period. The variously delayed signals appearing at the lower terminal of delay line 341 are smoothed to form an artificial wave by low-pass filter 342, proportioned to eliminate all frequencies greater than 3,000 cycles per second. The output terminal of filter 342 is connected to a conventional reproducer 343, which converts the artificially reconstructed electrical speech wave into audible and intelligible speech sounds.
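In discrete-time terms, the tapped delay line of FIG. 3 behaves like a filter whose tap weights are the control signals mirrored about the center tap. The sketch below illustrates that reading under simplifying assumptions: one sample per tap interval, an excitation given as a list of numbers, and no output low-pass filter. The names `symmetric_taps` and `delay_line_output` are hypothetical.

```python
def symmetric_taps(center, half_period):
    """Tap weights in line order r_N..r_1, R_0, R_1..R_N: each half-period
    control signal feeds two taps at equal distances from the center tap."""
    return list(reversed(half_period)) + [center] + list(half_period)

def delay_line_output(excitation, taps):
    """Sum over taps of the excitation scaled by the tap weight and delayed
    by the tap position (one sample per tap interval is assumed)."""
    out = [0.0] * (len(excitation) + len(taps) - 1)
    for i, x in enumerate(excitation):
        for j, weight in enumerate(taps):
            out[i + j] += x * weight
    return out

taps = symmetric_taps(1.0, [0.5, 0.25])
# A single excitation pulse reproduces one symmetric period of the tap pattern.
one_period = delay_line_output([1.0], taps)
```

A train of pulses spaced one pitch period apart would lay such symmetric periods end to end, which is the role the buzz source plays for voiced sounds.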

Alternate Analyzer

Alternative apparatus for representing speech in terms of the one-sided amplitude spectrum, $\Phi_1(nf_0)$, is shown in FIG. 4. As given by Equation 17, $\Phi_1(nf_0)$ may be expressed in terms of half-period samples of both $\varphi(\tau)$ and $\hat{\varphi}(\tau)$, and the apparatus of FIG. 4 is based upon this relationship.

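Referring now to FIG. 4, an incoming speech wave g(t) from microphone 40 is band-limited by low-pass filter 410 to remove all frequencies greater than W cycles per second. The band-limited speech wave is applied to tapped delay line 431, at whose taps P41, P42, ..., P4N appear signals proportional to the variously delayed speech wave, $g(t-\tau_1)$, $g(t-\tau_2)$, ..., $g(t-\tau_N)$, where $\tau_1 = \frac{1}{2W}$ seconds, $\tau_2 = \frac{2}{2W}$ seconds, ..., $\tau_N = \frac{N}{2W}$ seconds are the various delay times corresponding to taps P41, P42, ..., P4N, respectively.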

The signals appearing at the taps of delay line 431 are passed to the second input points of modulators M41, M42, ..., M4N, and the modulators develop at their output terminals signals whose magnitudes are proportional to the products $g(t) \cdot g(t-\tau_1)$, $g(t) \cdot g(t-\tau_2)$, ..., $g(t) \cdot g(t-\tau_N)$, respectively. The output terminals of these modulators are connected to averaging devices, for example, low-pass filters F41, F42, ..., F4N, respectively, which are constructed to pass frequencies equal to or less than 25 cycles per second, thereby forming output signals proportional to samples of the autocorrelation function at various delay times, $\varphi(\tau_1)$, $\varphi(\tau_2)$, ..., $\varphi(\tau_N)$.

In addition, the incoming speech wave is correlated with itself to form a signal proportional to the $\varphi(0)$ term in Equation 17. This is accomplished by applying the undelayed speech wave to both input terminals of modulator M40, and by passing the product output signal $g(t) \cdot g(t)$ of M40 through low-pass filter F40.

The band-limited speech wave output of filter 410 is also applied to ninety-degree phase-shifter 430 which develops a signal proportional to the Hilbert transform $\hat{g}(t)$ of the speech wave. The use of ninety-degree phase-shifter 430 to obtain the Hilbert transform of the speech wave is based upon the well-known quadrature relationship between a function and its Hilbert transform, a proof of which is found in S. Goldman, Information Theory, page 332 (1953). Since functions that are in quadrature with each other differ solely in phase by ninety degrees, element 430 may be one of several equally suitable ninety-degree phase-shifting devices for obtaining the Hilbert transform $\hat{g}(t)$ of the input signal g(t).
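The quadrature relationship can be illustrated with a small numerical phase shifter. This is a sketch of one common digital approach (rotating each frequency component by ninety degrees through a discrete Fourier transform), not the analog phase-shifting network of element 430; it assumes the input is exactly one or more whole periods of a periodic wave, and the function name `hilbert_transform` is hypothetical.

```python
import cmath
import math

def hilbert_transform(x):
    """Shift every frequency component of the periodic sequence x by ninety degrees."""
    N = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
                for n in range(N)]
    shifted = []
    for n, value in enumerate(spectrum):
        if n == 0 or 2 * n == N:
            shifted.append(0)            # the mean and the Nyquist bin have no quadrature part
        elif 2 * n < N:
            shifted.append(-1j * value)  # positive frequencies: lag by ninety degrees
        else:
            shifted.append(1j * value)   # negative frequencies: lead by ninety degrees
    return [sum(s * cmath.exp(2j * math.pi * n * k / N)
                for n, s in enumerate(shifted)).real / N
            for k in range(N)]

N = 32
cosine = [math.cos(2 * math.pi * 3 * k / N + 0.4) for k in range(N)]
quadrature = hilbert_transform(cosine)   # close to sin(2*pi*3*k/N + 0.4)
```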

The Hilbert transform output of phase-shifter 430 is applied in parallel to one input point of a second bank of modulators m41, m42, ..., m4N, and together with the delayed signals from taps P41, P42, ..., P4N applied to the second input points of these modulators, there is formed at their output terminals signals whose magnitudes are proportional to values of the ninety-degree phase-shifted speech autocorrelation function, $\hat{\varphi}(\tau_1)$, $\hat{\varphi}(\tau_2)$, ..., $\hat{\varphi}(\tau_N)$, in accordance with Equation 5.

From Equation 17, the one-sided amplitude spectrum is determined by TW half-period samples of $\varphi(\tau)$ and TW half-period samples of $\hat{\varphi}(\tau)$. It is understood, however, that the half-period samples of $\hat{\varphi}(\tau)$ from $-\frac{1}{2W}$ to $-\frac{T}{2}$ seconds delay are inverted images of the half-period samples from $\frac{1}{2W}$ to $\frac{T}{2}$ seconds delay.

As in the case of the representation of speech by samples of $\varphi(\tau)$ alone, variations in the period T of both $\varphi(\tau)$ and $\hat{\varphi}(\tau)$ make it necessary to vary the number of samples of both $\varphi(\tau)$ and $\hat{\varphi}(\tau)$ in synchrony with changes in the fundamental pitch frequency, $f_0 = \frac{1}{T}$, of the incoming speech wave. This is achieved by providing delay line 431 with a fixed number of taps sufficient to sample the longest pitch period to be fully represented. For pitch periods shorter than the maximum to be represented fully, filters F40, F41, F42, ..., F4N, and f41, f42, ..., f4N are connected to gates G40, G41, G42, ..., G4N, and g41, g42, ..., g4N, respectively, which block excess samples occurring at points past the half-period sample point of both $\varphi(\tau)$ and $\hat{\varphi}(\tau)$. The gates shown in FIG. 4 operate in the same fashion as the gates shown in FIG. 2A, being set to close at various predetermined thresholds by a variable magnitude pitch control signal from circuit PC4. Circuit PC4, which is identical in both structure and operation to circuit PC2 of FIG. 2A, produces from the band-limited output signal of filter 410 a pitch control signal whose magnitude is proportional to the instantaneous fundamental pitch frequency.

The autocorrelation samples passed by the gates constitute a set of control signals representing half periods of each of the speech autocorrelation functions, $\varphi(\tau)$ and $\hat{\varphi}(\tau)$. Each of these signals occupies a relatively narrow-frequency range, on the order of 25 cycles per second, and the entire group of signals may be transmitted over a substantially narrower band of frequencies than that required for transmission of the original speech wave.

As in the case of speech representation by samples of $\varphi(\tau)$ alone, the control signals derived by the analyzer apparatus of FIG. 4 must be supplemented with a signal indicative of the voiced-unvoiced and fundamental pitch frequency characteristics of the instantaneous speech sound. This supplementary or pitch control signal is derived from the incoming speech wave from source 40 by circuit PC4. The pitch control signal and the autocorrelation control signals constitute the compressed frequency representation of speech from which artificial speech is reconstructed.

The autocorrelation control signals and the supplementary signal produced by the apparatus of FIG. 4 may be transmitted over a narrow-band channel by any suitable method, for example, by pulse-code modulation. After transmission, speech is reproduced from the signals by a synthesizer, a preferred embodiment of which is shown in FIG. 5.

Alternate Synthesizer

The synthesizer shown in FIG. 5 simultaneously reconstructs samples of symmetrical periods from the half-period $\varphi(\tau)$ control signals and samples of antisymmetrical periods from the half-period $\hat{\varphi}(\tau)$ control signals. The $\varphi(0)$ control signal is applied to a modulator whose output terminal is connected to center tap R50 of delay line 541, terminated in a matched impedance 521 to prevent reflection. Each of the other control signals is applied simultaneously to the control terminals of two modulators whose output terminals are connected to delay line 541 through two taps disposed at equal intervals above and below center tap R50. Thus control signal $\varphi(\tau_1)$, for example, is applied to the control terminals of modulators L51, l51, the output terminals of which are connected to taps R51, r51, respectively, located at equal $\frac{1}{2W}$ second intervals about center tap R50.

In addition, the output terminals of each pair of modulators receiving a sample of $\varphi(\tau)$ and of $\hat{\varphi}(\tau)$ at the same delay time are connected to the same tap of delay line 541, in accordance with Equation 16. Hence, the output terminals of modulators L51, ℒ51 are both connected to tap R51, and the output terminals of modulators l51, ℓ51 are both connected to tap r51.

The control signals adjust the amplitude of an excitation signal supplied to the input terminals of the modulators from vocal excitation signal generator 591. Generator 591, which is preceded by equalizing delay 51, is identical in construction and operation to generator 391 of FIG. 3, deriving from the incoming pitch control signal an excitation signal closely correlated with the voiced-unvoiced and fundamental pitch frequency characteristics of the original speech wave.

The excitation signal is applied directly to the input terminals of all modulators supplied with φ(τ) control signals, thereby developing at their respective delay line taps samples of a symmetrical wave duplicating the original symmetry of φ(τ). For modulators supplied with control signals of the antisymmetric function, however, the excitation signal is applied directly to the input terminals of only half of these modulators, for example, those connected to taps above center tap R50, and is reversed in polarity by polarity inverter 571 before being applied to the input terminals of the remaining modulators, connected to taps below center tap R50. Thus the modulators supplied with the antisymmetric control signals develop at their respective taps samples of an antisymmetrical wave duplicating the original antisymmetry of that function.

The reconstructed signal formed at the output terminal of delay line 541 is a linear combination of the amplitude-adjusted excitation signals formed at the various taps, and the information content of the spectrum of this reconstructed signal equals the information content of the spectrum of the original speech autocorrelation function. The reconstructed output signals of delay line 541 are smoothed by low-pass filter 542, having a passband of 0 to 3,000 cycles per second, to form a synthetic speech wave. Artificial speech is reconstructed from the output wave of filter 542 by reproducer 543, which may be of any well-known construction.
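The tap arrangement of FIG. 5 can be illustrated numerically. The following is a minimal sketch, not the patent's circuitry: the names `sym` and `anti` (half-period control signals of the symmetric and antisymmetric functions), the list layout, and the choice of which side of the center tap counts as "above" are all assumptions made for illustration, and the delay line is modeled as a discrete convolution.

```python
def tap_weights(sym, anti):
    """Build delay-line tap gains from half-period control signals.

    sym  : [s0, s1, ..., sN]  samples of the symmetric function (s0 at the center tap)
    anti : [t1, ..., tN]      samples of the antisymmetric function (no center sample)
    Names and list layout are assumptions made for this sketch.
    """
    n = len(anti)
    assert len(sym) == n + 1
    # Taps above center: excitation applied directly to both modulators.
    above = [sym[k] + anti[k - 1] for k in range(1, n + 1)]
    # Taps below center: the polarity inverter flips the antisymmetric contribution.
    below = [sym[k] - anti[k - 1] for k in range(1, n + 1)]
    # Order the taps from the lowest (-N) through the center (0) to the highest (+N).
    return below[::-1] + [sym[0]] + above


def delay_line_output(excitation, weights):
    """Sum of amplitude-adjusted, delayed copies of the excitation (a convolution)."""
    out = [0.0] * (len(excitation) + len(weights) - 1)
    for i, e in enumerate(excitation):
        for j, w in enumerate(weights):
            out[i + j] += e * w
    return out


sym = [1.0, 0.5, 0.2]
anti = [0.3, 0.1]
weights = tap_weights(sym, anti)
response = delay_line_output([1.0], weights)  # response to a single excitation pulse
```

For a single excitation pulse the output reproduces the tap gains themselves; their even part returns `sym` and their odd part returns `anti`, which is the sense in which the two sets of modulators rebuild the symmetrical and antisymmetrical waves.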

Unsquaring Apparatus

The amplitude spectrum of artificial speech synthesized directly from the autocorrelation control signals produced by the analyzer apparatus shown in either FIG. 2A or FIG. 4 is the square of the amplitude spectrum of the original speech wave, in accordance with Equations 3b and 7. This squaring of the amplitude spectrum impairs the intelligibility of most artificial speech sounds reconstructed from the autocorrelation control signals, due to changes in relative spectral values. In order for synthetic speech to be free of the spectrum squaring defects inherent in the autocorrelation function representation of speech, the present invention subjects the control signals produced by the above analyzers to a square-root-taking operation prior to speech synthesis.
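The squaring relation can be checked on a short sequence. The sketch below is only an illustration under stated assumptions: it uses a circular (periodic) autocorrelation and a plain discrete Fourier transform, neither of which is taken from the patent, and the test wave `x` is arbitrary.

```python
import cmath
import math


def dft(x):
    """Plain discrete Fourier transform (O(n^2), adequate for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_autocorrelation(x):
    """phi[m] = sum over t of x[t] * x[(t + m) mod n]."""
    n = len(x)
    return [sum(x[t] * x[(t + m) % n] for t in range(n)) for m in range(n)]


x = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]   # arbitrary test wave
amplitude = [abs(v) for v in dft(x)]                 # amplitude spectrum of x
phi = circular_autocorrelation(x)
auto_spectrum = [v.real for v in dft(phi)]           # spectrum of the autocorrelation
# Square-root step; tiny negative rounding errors are clipped to zero.
unsquared = [math.sqrt(max(v, 0.0)) for v in auto_spectrum]
```

Here `auto_spectrum` equals the square of `amplitude` bin by bin, and `unsquared` recovers `amplitude`, which is the effect the square-root-taking operation is meant to achieve.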

Apparatus for removing the square law dependence of the autocorrelation amplitude spectrum is shown in FIGS. 6A, 6B, 7A, 7B, and 7C. The apparatus of FIGS. 6A and 6B is designed to operate upon the control signals produced by the analyzer shown in FIG. 2A, and the apparatus of FIGS. 7A, 7B, and 7C is designed to operate upon the control signals produced by the analyzer shown in FIG. 4. The apparatus shown in FIGS. 6A, 6B, 7A, 7B, and 7C may be located at either the analyzer station or the synthesizer station, or, if desired, the component parts may be conveniently divided between the two stations.

As shown in FIG. 6A, the incoming autocorrelation control signals derived from an analyzer of the type illustrated in FIG. 2A are applied to the input terminals of network 60. The function of network 60 is to transform the control signals into signals representative of the autocorrelation amplitude spectrum, in accordance with the Fourier transformation given by Equation 9. The amplitude spectrum signals developed at the output terminals of network 60 are then passed through a bank of rooters H61, H62, ..., H6p, which perform a square-root-taking operation, thereby producing a new set of signals representing an amplitude spectrum of the same shape as the amplitude spectrum of the original speech wave. The rooted spectrum signals are then passed through network 61 of FIG. 6B, which operates in accordance with the inverse Fourier transformation given by Equation 9a to produce a new set of autocorrelation control signals from which artificial speech free of spectrum squaring distortion may be reconstructed by a synthesizer of the type shown in FIG. 3.

Recalling Equation 9, the autocorrelation amplitude spectrum is derived from samples of the autocorrelation function in the following fashion, ignoring the constant factor:

$$\Phi(b f_0) = \tfrac{1}{2}\,\varphi_0 + \sum_{c=1}^{N} \varphi_c \cos(a\,b\,c\,f_0), \qquad b = 1, 2, \ldots, p \tag{21}$$

where φ_c is the cth autocorrelation control signal, φ_0 being the sample φ(0), and a is the constant common to every cosine argument.

From Equation 21 it is seen that signals proportional to discrete values of the autocorrelation amplitude spectrum may be derived by various linear combinations of the autocorrelation control signals, after weighting the control signals by signals proportional to selected cosine functions. Referring to the apparatus shown in FIG. 6A, network 60 consists of an array of p·N multipliers m_ij, i = 1, 2, ..., p, j = 1, 2, ..., N, of conventional construction, arranged in p rows and N columns. Each row of multipliers corresponds to one series in Equation 21, and each multiplier corresponds to one term containing a cosine factor in a particular series. Each multiplier is provided with two input terminals and one output terminal, and the multipliers in the jth column have one of their input terminals connected to a common input point, I_j, to which point the jth autocorrelation control signal is applied. The second input terminals of the multipliers in each row are connected to the output terminals of one of the weighting signal sources S1, S2, ..., Sp, each of which generates N weighting signals proportional to the N cosine factors in the particular series of Equation 21 to which each row of multipliers corresponds.

From the autocorrelation control signals and the weighting signals, the multipliers in each row develop at their output terminals weighted autocorrelation signals proportional to the individual weighted terms of a particular series of Equation 21. The output terminals of the multipliers in the ith row are connected to one input terminal of adder Ai, and the control signal φ(0), after a reduction in amplitude achieved by passing it through amplifier 62 with a gain constant of 1/2, is applied to the other input terminal of the ith adder. The linear combination of weighted control signals formed at the output terminal of each adder is proportional to a discrete value of the autocorrelation amplitude spectrum, in accordance with Equation 21.
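The multiplier-and-adder array of network 60 amounts to evaluating p cosine series. The sketch below assumes each series has the form φ(0)/2 plus the sum over c of φ_c·cos(a·b·c·f₀), which is inferred from the description of the rows, the half-gain amplifier, and the three constants; the function name `network_60` and the symbol `a` for the first constant are hypothetical.

```python
import math


def network_60(controls, a, f0, p):
    """Evaluate p cosine series of the assumed Equation 21 form.

    controls : [phi_0, phi_1, ..., phi_N]  autocorrelation control signals
    a        : first constant in the cosine argument (assumed symbol)
    f0       : fundamental pitch frequency
    p        : number of rows, i.e. spectrum values to compute
    """
    n = len(controls) - 1
    spectrum = []
    for b in range(1, p + 1):                            # one row of multipliers per b
        row = [controls[c] * math.cos(a * b * c * f0)    # one multiplier per column c
               for c in range(1, n + 1)]
        spectrum.append(0.5 * controls[0] + sum(row))    # adder: row sum plus phi_0 / 2
    return spectrum


# A lone phi_0 sample yields the same value, phi_0 / 2, at every adder output.
flat = network_60([1.0, 0.0, 0.0, 0.0], a=0.001, f0=100.0, p=4)
```

The values of `a` and `f0` used in the example are arbitrary and serve only to exercise the function.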

It is observed in Equations 9 and 21 that the autocorrelation amplitude spectrum is a cosine function of the fundamental pitch frequency, f₀, of the original speech wave. The variable nature of the pitch frequency has been noted and discussed earlier in connection with the gating apparatus of FIG. 2A. In the present discussion of network 60, the weighting signals supplied to the multipliers from sources S1, S2, ..., Sp must vary with changes in f₀ in order to obtain signals that accurately represent the amplitude spectrum. A signal that changes in synchrony with variations in f₀ is already available in the form of the pitch control signal output of circuit PC2 of FIG. 2A, and from this signal, after appropriate modification, sources S1, S2, ..., Sp derive the various weighting signals utilized in network 60.

It is observed in Equation 21 that, in addition to the variable f₀, the arguments of the cosine weighting factors contain three constants. It is necessary, therefore, that before the cosine weighting signals are generated, the pitch control signal be appropriately amplified in order to form the proper argument for the cosine signals.

Since the first constant appears in the argument of every weighting factor, the pitch control signal is first passed through amplifier 6, for example, a conventional voltage amplifier, having a gain constant equal to the first constant. The second constant, b = 1, 2, ..., p, is the same in the argument of each cosine factor appearing in a single series of (21). Since each source generates signals corresponding to the cosine factors in a single series, each source S2, ..., Sp (except S1, where b = 1) is preceded by an amplifier K2, ..., Kp having a gain constant 2, ..., p, respectively, appropriate to the value of b for that source. Thus the output signal of amplifier 6 is applied in parallel to the input terminals of source S1 and amplifiers K2, ..., Kp, thereby producing at the input terminals of sources S1, S2, ..., Sp signals proportional to 1, 2, ..., p times the output signal of amplifier 6, respectively.

The third constant, c = 1, 2, ..., N, differs from term to term within a single series of (21), and each source is therefore provided with a bank of N-1 amplifiers Ki2, ..., KiN, i = 1, 2, ..., p, with gain constants 2, ..., N, respectively, it being understood that no amplifier is necessary for the constant 1.

In addition, each source contains a bank of N cosine function generators, of any well-known variety, each of which develops at its output terminal a signal whose magnitude is proportional to the cosine of the magnitude of the signal applied to its input terminal. For example, in the generation of weighting signals by source S2, the output signal of amplifier K2 is applied in parallel to cosine generator C21, for c = 1, and to amplifiers K22, ..., K2N, having gain constants 2, ..., N, respectively. Hence the signals appearing at the input terminals of cosine function generators C21, C22, ..., C2N are proportional to 1, 2, ..., N times the output signal of amplifier K2, respectively, and the signals developed at the output terminals of these generators are proportional to the cosines of those quantities, respectively.
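The amplifier cascade inside one weighting signal source can be modeled as successive multiplications followed by a bank of cosine generators. This is a rough sketch only: the function name `weighting_source`, its parameter names, and the numerical gain used in the example are assumptions, not values from the patent.

```python
import math


def weighting_source(pitch_signal, first_gain, b, n):
    """Model one source S_b and return its N cosine weighting signals.

    pitch_signal : value proportional to the pitch frequency f0
    first_gain   : gain of the amplifier shared by all sources (first constant)
    b            : gain of amplifier K_b (1 for S1, which needs no amplifier)
    n            : number of cosine function generators in the source
    """
    shared = first_gain * pitch_signal     # output of the common amplifier
    row_input = b * shared                 # after amplifier K_b
    # Amplifiers K_b2 ... K_bN scale by c = 2 ... N; c = 1 needs no amplifier.
    arguments = [c * row_input for c in range(1, n + 1)]
    return [math.cos(arg) for arg in arguments]


# Source S2 with three generators; 0.001 is an arbitrary illustrative gain.
weights_at_100 = weighting_source(100.0, 0.001, 2, 3)
weights_at_120 = weighting_source(120.0, 0.001, 2, 3)   # pitch has risen
```

Because every argument is a multiple of the pitch control signal, the two sets of weights differ, which is how the weighting signals follow variations in f₀.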

The number of weighting signal sources utilized depends upon the frequency resolution desired in the amplitude spectrum of the reconstructed speech wave. For maximum resolution of voices having a fundamental pitch frequency of 100 cycles per second and a bandwidth of 3,000 cycles per second, for example, 3,000/100 = 30 sources are required. The number of weighting signals that each source must generate equals the number of control signals to be weighted. As previously discussed, the maximum number of control signals depends upon the longest period which it is desired to represent fully. Thus for full representation of 10 millisecond periods, a maximum of 30 control signals must be weighted and each weighting signal source must contain 30 cosine function generators.
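The counts quoted above can be reproduced with a short calculation. The sketch assumes that the autocorrelation samples are spaced at the Nyquist interval 1/(2 × bandwidth) and that maximum resolution means one spectrum value per harmonic of the pitch frequency within the band; both are assumptions of this illustration, and the function name is hypothetical.

```python
def analyzer_dimensions(bandwidth_cps, pitch_cps, longest_period_s):
    """Return (p, N): harmonics resolved and half-period control signals.

    Assumes samples spaced at the Nyquist interval 1 / (2 * bandwidth),
    which is an assumption of this sketch.
    """
    p = round(bandwidth_cps / pitch_cps)            # harmonics of f0 inside the band
    sample_interval = 1.0 / (2.0 * bandwidth_cps)   # seconds between delay taps
    half_period = longest_period_s / 2.0            # only half periods are transmitted
    n = round(half_period / sample_interval)
    return p, n


# 3,000-cycle band, 100-cycle pitch, 10-millisecond longest period.
p, n = analyzer_dimensions(3000.0, 100.0, 0.010)
```

Under these assumptions both counts come out to 30, in agreement with the 30 control signals and 30 cosine function generators stated in the text.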

Having obtained signals proportional to discrete values of the autocorrelation amplitude spectrum at the output points of network 60, the signals are applied to a bank of rooters H61, H62, ..., H6p, of any suitable variety. The rooters develop at their respective output points signals proportional to the square roots of the individual spectrum values,

$$\Phi'(b f_0) = \sqrt{\Phi(b f_0)}, \qquad b = 1, 2, \ldots, p.$$

From Equation 3b, it is apparent that the rooters produce a group of signals representing an amplitude spectrum that is of the same shape as the amplitude spectrum of the original speech wave, thereby removing the spectrum squaring distortion inherent in the autocorrelation function representation of speech.

In order to reconstruct speech in a synthesizer of the type shown in FIG. 3, the rooted spectrum signals must be converted into their autocorrelation function counterparts. This is achieved by passing the rooted spectrum signals through the apparatus of FIG. 6B, which converts the rooted spectrum signals into autocorrelation samples in accordance with the inverse Fourier transformation of Equation 9a.

From Equation 9a, the autocorrelation function samples corresponding to the rooted spectrum signals are given by the following group of cosine series:

$$\varphi'_c = \sum_{b=1}^{p} \Phi'(b f_0)\cos(a\,b\,c\,f_0), \qquad c = 0, 1, \ldots, N \tag{22}$$

in which φ'_c is the cth half-period sample of φ'(τ) and a, b, and c are the constants of the cosine arguments. It is observed in Equation 22 that half-period samples of the autocorrelation function φ'(τ) corresponding to the rooted spectrum Φ'(bf₀) may be derived from various linear combinations of rooted spectrum signals, after weighting the rooted spectrum signals by selected cosine function signals.

A comparison of Equations 21 and 22 reveals that the transformation of the rooted spectrum signals into autocorrelation signals may be accomplished by apparatus similar in structure to that shown in FIG. 6A. Referring now to FIG. 6B, network 61 consists of an array of p·N multipliers m'_ij, i = 1, 2, ..., p, j = 1, 2, ..., N, arranged in p rows and N columns, each column of multipliers corresponding to one cosine series in Equation 22, and each multiplier in each column corresponding to one term in a particular cosine series. Each multiplier is provided with two input terminals and one output terminal, and the multipliers in the ith row have one of their input terminals connected to a common input point, I'_i, to which point the ith rooted spectrum signal, Φ'(if₀), is applied. The second input terminals of the multipliers in the ith row are supplied with weighting signals from the ith source Si of FIG. 6A. The sources supply weighting signals proportional to the cosine factors in Equation 22, and the p multipliers in each column develop at their output terminals weighted spectrum signals proportional to the p individual cosine terms of a particular series of Equation 22. The output terminals of the multipliers in the jth column are connected to a common output point O'_j, and the linear combination of weighted spectrum signals formed at each common output point is proportional to a specific value of the corresponding autocorrelation function φ'(τ), in accordance with Equation 22.

The value of φ'(τ) for τ = 0 is derived in accordance with the first series of Equation 22 by applying the rooted spectrum signals to the input terminals of an adder 64 of conventional construction, and the output signal developed by adder 64 is proportional to φ'(0).

Artificial speech is reconstructed from the autocorrelation signals produced by network 61 by applying them together with the pitch control signal from circuit PC2 of the analyzer apparatus of FIG. 2A to a synthesizer such as that shown in FIG. 3 of this application. Artificial speech reconstructed from these signals has an amplitude spectrum of the same shape as the amplitude spectrum of the original speech wave; hence the artificial speech is free of the distortion caused by spectrum squaring.
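The three stages of FIGS. 6A and 6B (cosine series, square root, cosine series back to autocorrelation samples) can be chained in a few lines. This is a sketch under stated assumptions rather than a statement of the patent's exact equations: constant factors are ignored, the forward and inverse series are assumed to share the cosine argument a·b·c·f₀, negative spectrum values are clipped before rooting, and the function name `unsquare` is hypothetical.

```python
import math


def unsquare(controls, a, f0, p):
    """Sketch of FIG. 6A followed by FIG. 6B.

    controls : [phi_0, ..., phi_N] autocorrelation control signals
    Returns the new control signals [phi'_0, ..., phi'_N].
    The clipping of negative values is an assumption of this sketch.
    """
    n = len(controls) - 1
    # Network 60: one cosine series per spectrum value.
    spectrum = [0.5 * controls[0]
                + sum(controls[c] * math.cos(a * b * c * f0) for c in range(1, n + 1))
                for b in range(1, p + 1)]
    # Rooters: square root of each spectrum value.
    rooted = [math.sqrt(max(v, 0.0)) for v in spectrum]
    # Network 61: each column sums the rooted values with cosine weights.
    # For c = 0 every weight is cos(0) = 1, which corresponds to plain adder 64.
    return [sum(rooted[b - 1] * math.cos(a * b * c * f0) for b in range(1, p + 1))
            for c in range(n + 1)]


# Flat spectrum example in arbitrary units: a * f0 = pi / 4, four spectrum values.
new_controls = unsquare([2.0, 0.0, 0.0], a=1.0, f0=math.pi / 4, p=4)
```

In the example the spectrum is flat with value 1, so the rooted values are all 1 and the zero-delay output is simply their sum, as produced by the adder described for the first series.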

FIGS. 7A, 7B, 7C illustrate apparatus for unsquaring the amplitude spectrum of control signals produced by an analyzer of the type shown in FIG. 4. The apparatus shown in FIGS. 7A, 7B, and 7C operates upon the same principles as that of FIGS. 6A and 6B to transform the incoming autocorrelation control signals into one-sided amplitude spectrum signals, to take the square root of the spectrum signals, and to convert the rooted spectrum signals into a new set of autocorrelation control signals.

Referring to Equation 17, the one-sided spectrum may be derived from the half-period samples of φ(τ) and of the antisymmetric function by the transformation of Equation 23, in which a constant factor is ignored. It is apparent from Equation 23 that discrete values of the one-sided spectrum may be obtained by forming various linear combinations of the two sets of autocorrelation samples, after weighting the φ(τ) samples by selected cosine function signals and weighting the samples of the antisymmetric function by selected sine function signals.

Referring now to the apparatus shown in FIG. 7A, network 70 consists of an array of p·2N multipliers x_ij, y_ij, i = 1, 2, ..., p, j = 1, 2, ..., N, arranged in p rows and 2N columns, each row of multipliers corresponding to one series in Equation 23, and each multiplier in each row corresponding to one term in a particular series. Each multiplier is provided with two input terminals and one output terminal, and the multipliers x_ij, y_ij in the jth columns have one of their input terminals connected to common input points, to which points the jth samples of φ(τ) and of the antisymmetric function, respectively, are applied, in accordance with Equation 23. The second input terminals of the multipliers in each row are connected to the output terminals of one of the weighting signal sources W1, W2, ..., Wp. Each source generates N signals proportional to the cosine factors and N signals proportional to the sine factors in the particular series of Equation 23 to which each row of multipliers corresponds.

From the autocorrelation control signals and the weighting signals, the multipliers in each row develop at their output terminals weighted autocorrelation signals proportional to the terms of a particular series of Equation 23. The output terminals of the multipliers in the ith row are connected to one input point of adder Bi, and the control signal φ(0), after a reduction in magnitude achieved by passing it through amplifier 72, is applied to the other input terminal of the ith adder. The linear combination of weighted control signals formed at the output terminal of each adder is proportional to a discrete value of the one-sided amplitude spectrum, in accordance with Equation 23.
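A numerical sketch helps show why cosine weighting of the symmetric samples combined with sine weighting of the antisymmetric samples yields a one-sided spectrum. Several details below are assumptions: the positive sign on the sine terms, the gain of 1/2 for the φ(0) path (chosen by analogy with amplifier 62), the cosine argument a·b·c·f₀, and the names `network_70`, `sym`, and `anti`.

```python
import math


def network_70(sym, anti, a, f0, p):
    """One-sided spectrum values from symmetric and antisymmetric samples.

    sym  : [s0, s1, ..., sN] half-period samples of the symmetric function
    anti : [t1, ..., tN]     half-period samples of the antisymmetric function
    The '+' sign on the sine terms and the 1/2 gain are assumed conventions.
    """
    n = len(anti)
    values = []
    for b in range(1, p + 1):                        # one row of 2N multipliers
        cos_terms = sum(sym[c] * math.cos(a * b * c * f0) for c in range(1, n + 1))
        sin_terms = sum(anti[c - 1] * math.sin(a * b * c * f0) for c in range(1, n + 1))
        values.append(0.5 * sym[0] + cos_terms + sin_terms)   # adder for row b
    return values


# Test pair: a single component at the 3rd harmonic (arbitrary units).
N, P, K = 8, 8, 3
step = math.pi / N                                   # plays the role of a * f0
sym = [math.cos(K * step * c) for c in range(N + 1)]
anti = [math.sin(K * step * c) for c in range(1, N + 1)]
spectrum = network_70(sym, anti, 1.0, step, P)
```

With this cosine and sine pair the products combine into cos((K - b)·step·c), so the output is large only in the row where b equals K, while the other rows nearly cancel.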

It is noted in Equation 23 that the one-sided amplitude spectrum is also a function of the fundamental pitch frequency, f₀, of the original speech. In order to reproduce the spectrum accurately, the weighting signals generated by sources W1, W2, ..., Wp must vary with changes in f₀. Sources W1, W2, ..., Wp derive the various weighting signals from the pitch control signal produced by circuit PC4 of FIG. 4, thereby producing weighting signals whose magnitudes vary in synchrony with variations in f₀.

It is observed in Equation 23, as in Equation 21, that in addition to the variable f₀ the arguments of both the sine and cosine weighting factors contain three constants. As in the weighting signal sources of the apparatus of FIG. 6A, the pitch control signal must be appropriately modified in order to form the proper argument for the weighting signals.

Since the first constant appears in the argument of every weighting factor, the pitch control signal is first passed through amplifier 7, having a gain constant equal to that constant.

1. IN A SYSTEM FOR THE NARROW-BAND TRANSMISSION OF SPEECH, THE COMBINATION THAT COMPRISES A SOURCE OF AN INCOMING SPEECH WAVE, MEANS FOR OBTAINING FROM SAID SPEECH WAVE A SIGNAL REPRESENTATIVE OF THE INSTANTANEOUS PITCH CHARACTERISTIC OF SAID SPEECH WAVE, MEANS FOR CORRELATING SAID SPEECH WAVE WITH ITSELF TO PRODUCE SPEECH AUTOCORRELATION FUNCTION SAMPLES, MEANS UNDER THE CONTROL OF SAID PITCH SIGNAL FOR GATING SAID SPEECH AUTOCORRELATION FUNCTION SAMPLES TO OBTAIN A GROUP OF NARROW-BAND CONTROL SIGNALS WHICH VARY IN NUMBER IN SYNCHRONY WITH CHANGES IN SAID PITCH CHARACTERISTIC, MEANS FOR TRANSMITTING SAID PITCH SIGNAL AND SAID CONTROL SIGNALS TO A RECEIVER STATION, AND, AT SAID RECEIVER STATION, MEANS FOR RECONSTRUCTING AN ARTIFICIAL SPEECH WAVE FROM SAID TRANSMITTED PITCH SIGNAL AND SAID TRANSMITTED CONTROL SIGNALS.